Introduction

R is an open-source programming language for data analysis, statistical computing, and graphics. Like most languages, its syntax is built from variables, comments, and keywords. It first appeared in 1993 and runs on Windows, macOS, and Unix-like platforms, including Linux.
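To make the three syntax elements concrete, here is a minimal base R sketch. The names (radius, describe_size, sizes) and the threshold of 10 are made up for illustration.

```r
# A comment starts with '#' and runs to the end of the line.

# Variables are usually assigned with '<-'.
radius <- 2
area <- pi * radius^2

# 'function', 'if', 'else', and 'for' are reserved keywords.
describe_size <- function(x) {
  if (x > 10) "large" else "small"
}

# Classify the area of circles with radius 1, 2, and 3.
sizes <- character(0)
for (r in c(1, 2, 3)) {
  sizes <- c(sizes, describe_size(pi * r^2))
}
print(sizes)
```

No packages are needed to run this, so it is a quick way to check that R is installed and working.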

Random Notes:

  • RStudio is like the “VS Code” for R
  • RMarkdown is like the “Jupyter Notebook” for R

Some of the Pros

  • Open Source
  • Platform-independent
  • Lots of packages (more than 10,000)
  • Great for statistics
  • Good for machine learning
  • Data wrangling
  • Data visualization
  • Still growing

Some of the Cons

  • Complicated language…apparently
  • Not as secure (may not be the safest for web applications?)
  • Can be slow (reportedly slower than Python for some workloads)
  • Takes up a lot of memory
  • Sometimes inconsistent documentation/package quality

Packages

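Before calling functions from a package, load it with library(). A package only needs to be installed once, with install.packages(), but it must be loaded in every new R session.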

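library(tidyverse)        # attaches dplyr, ggplot2, tidyr, and other core packages
library(simplevis)
library(dplyr)            # already attached by tidyverse
library(palmerpenguins)   # penguins example dataset
library(sf)               # spatial data as data frames
library(leaflet)          # interactive maps
library(plotly)           # interactive charts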
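If library() fails with a "there is no package called ..." error, the package has not been installed yet. The sketch below checks a few of the packages above and reports any that are missing. The names pkgs, installed, and missing_pkgs are hypothetical, and the install step is left commented out so nothing is downloaded by accident.

```r
# Packages to check (a subset of the ones loaded above)
pkgs <- c("tidyverse", "sf", "leaflet", "plotly")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```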

Tidyverse

The tidyverse is a collection of packages with lots of useful functions for data wrangling. Here are a few examples of dplyr verbs, using the starwars dataset that ships with dplyr.

Filter

starwars %>% 
  filter(species == "Droid")
## # A tibble: 6 × 14
##   name   height  mass hair_color skin_color eye_c…¹ birth…² sex   gender homew…³
##   <chr>   <int> <dbl> <chr>      <chr>      <chr>     <dbl> <chr> <chr>  <chr>  
## 1 C-3PO     167    75 <NA>       gold       yellow      112 none  mascu… Tatooi…
## 2 R2-D2      96    32 <NA>       white, bl… red          33 none  mascu… Naboo  
## 3 R5-D4      97    32 <NA>       white, red red          NA none  mascu… Tatooi…
## 4 IG-88     200   140 none       metal      red          15 none  mascu… <NA>   
## 5 R4-P17     96    NA none       silver, r… red, b…      NA none  femin… <NA>   
## 6 BB8        NA    NA none       none       black        NA none  mascu… <NA>   
## # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
## #   starships <list>, and abbreviated variable names ¹​eye_color, ²​birth_year,
## #   ³​homeworld

Select

starwars %>% 
  select(name, ends_with("color"))
## # A tibble: 87 × 4
##    name               hair_color    skin_color  eye_color
##    <chr>              <chr>         <chr>       <chr>    
##  1 Luke Skywalker     blond         fair        blue     
##  2 C-3PO              <NA>          gold        yellow   
##  3 R2-D2              <NA>          white, blue red      
##  4 Darth Vader        none          white       yellow   
##  5 Leia Organa        brown         light       brown    
##  6 Owen Lars          brown, grey   light       blue     
##  7 Beru Whitesun lars brown         light       blue     
##  8 R5-D4              <NA>          white, red  red      
##  9 Biggs Darklighter  black         light       brown    
## 10 Obi-Wan Kenobi     auburn, white fair        blue-gray
## # … with 77 more rows

Mutate

starwars %>% 
  mutate(bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
## # A tibble: 87 × 4
##    name               height  mass   bmi
##    <chr>               <int> <dbl> <dbl>
##  1 Luke Skywalker        172    77  26.0
##  2 C-3PO                 167    75  26.9
##  3 R2-D2                  96    32  34.7
##  4 Darth Vader           202   136  33.3
##  5 Leia Organa           150    49  21.8
##  6 Owen Lars             178   120  37.9
##  7 Beru Whitesun lars    165    75  27.5
##  8 R5-D4                  97    32  34.0
##  9 Biggs Darklighter     183    84  25.1
## 10 Obi-Wan Kenobi        182    77  23.2
## # … with 77 more rows

Arrange

starwars %>% 
  arrange(desc(mass))
## # A tibble: 87 × 14
##    name        height  mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex   gender homew…⁵
##    <chr>        <int> <dbl> <chr>   <chr>   <chr>     <dbl> <chr> <chr>  <chr>  
##  1 Jabba Desi…    175  1358 <NA>    green-… orange    600   herm… mascu… Nal Hu…
##  2 Grievous       216   159 none    brown,… green,…    NA   male  mascu… Kalee  
##  3 IG-88          200   140 none    metal   red        15   none  mascu… <NA>   
##  4 Darth Vader    202   136 none    white   yellow     41.9 male  mascu… Tatooi…
##  5 Tarfful        234   136 brown   brown   blue       NA   male  mascu… Kashyy…
##  6 Owen Lars      178   120 brown,… light   blue       52   male  mascu… Tatooi…
##  7 Bossk          190   113 none    green   red        53   male  mascu… Trando…
##  8 Chewbacca      228   112 brown   unknown blue      200   male  mascu… Kashyy…
##  9 Jek Tono P…    180   110 brown   fair    blue       NA   male  mascu… Bestin…
## 10 Dexter Jet…    198   102 none    brown   yellow     NA   male  mascu… Ojom   
## # … with 77 more rows, 4 more variables: species <chr>, films <list>,
## #   vehicles <list>, starships <list>, and abbreviated variable names
## #   ¹​hair_color, ²​skin_color, ³​eye_color, ⁴​birth_year, ⁵​homeworld

Summarise

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(
    n > 1,
    mass > 50
  )
## # A tibble: 8 × 3
##   species      n  mass
##   <chr>    <int> <dbl>
## 1 Droid        6  69.8
## 2 Gungan       3  74  
## 3 Human       35  82.8
## 4 Kaminoan     2  88  
## 5 Mirialan     2  53.1
## 6 Twi'lek      2  55  
## 7 Wookiee      2 124  
## 8 Zabrak       2  80
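The verbs above are usually chained into a single pipeline. Here is a small sketch on a made-up table (the scores data and its column names are hypothetical, chosen so the result is easy to check by hand).

```r
library(dplyr)

# Hypothetical toy data: four students in two groups
scores <- tibble(
  student = c("A", "B", "C", "D"),
  group   = c("x", "x", "y", "y"),
  score   = c(80, 90, 70, 60)
)

result <- scores %>%
  filter(score >= 70) %>%          # keep rows that pass a condition
  group_by(group) %>%              # one summary row per group
  summarise(
    n = n(),
    mean_score = mean(score)
  ) %>%
  arrange(desc(mean_score))        # highest mean first

print(result)
```

Student D is filtered out, so group x has two rows (mean 85) and group y has one (mean 70).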

ggplot2 / Plotly

Point Chart

ggplot(starwars, aes(height, mass)) + 
  geom_point()

starwars2 <- filter(starwars, name != "Jabba Desilijic Tiure")
ggplot(starwars2, aes(height, mass, colour = species)) + 
  geom_point()
## Warning: Removed 28 rows containing missing values (geom_point).

Histogram / Density

ggplot(data = starwars, mapping = aes(x = height)) +
  geom_histogram(binwidth = 10)
## Warning: Removed 6 rows containing non-finite values (stat_bin).

ggplot(data = starwars, mapping = aes(x = height)) +
  geom_density()
## Warning: Removed 6 rows containing non-finite values (stat_density).

Bar Plot

ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color)) +
  geom_bar(position = "fill") +
  labs(y = "proportion")

p1 <- plot_ly(starwars, type='bar', x = ~species, y = ~sex)
p1
## Warning: Ignoring 4 observations

Boxplots

hde <- starwars %>% 
  filter(species %in% c("Human", "Droid", "Ewok"))

ggplot(hde, aes(species, mass)) + 
  geom_boxplot()

Faceting

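hd <- starwars %>% 
  filter(species %in% c("Human", "Droid")) %>% 
  select(name, height, mass, species)
# Reshape to long format: one row per character per measurement
# (pivot_longer() supersedes the older gather())
hd_g <- hd %>% 
  pivot_longer(c(height, mass), names_to = "measurement", values_to = "value")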

ggplot(hd_g, aes(species, value)) + 
  geom_boxplot() + 
  facet_grid(~measurement)

Other

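ggplot2 is based on the grammar of graphics: every plot is built from the same parts (data, aesthetic mappings, and one or more geom layers, plus optional scales, facets, and themes). Because the parts combine the same way for every chart type, plotting code stays consistent and flexible.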
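As a rough illustration of that layered approach, the sketch below builds one plot piece by piece from the starwars data. The object name p, the choice of a linear trend line, and the labels are assumptions made for this example.

```r
library(dplyr)
library(ggplot2)

# Drop the one extreme outlier so the other points are visible
starwars2 <- filter(starwars, name != "Jabba Desilijic Tiure")

p <- ggplot(starwars2, aes(x = height, y = mass)) +    # data + aesthetic mapping
  geom_point(na.rm = TRUE) +                           # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, na.rm = TRUE) +              # layer 2: linear trend
  labs(x = "Height (cm)", y = "Mass (kg)",
       title = "Star Wars characters") +               # labels
  theme_minimal()                                      # theme

p
```

Each `+` adds one component, so a layer can be swapped or removed without touching the rest of the plot.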

SF and Leaflet

“Package sf represents simple features as native R objects”. In other words, sf is the “geopandas” or “arcpy” of R. It lets you manipulate spatial objects stored in data frames. Paired with an open-source mapping package like leaflet or mapview, it can be used to create neat maps and visualizations of spatial data. For a quick view of the data, you can simply use the plot() function.
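To show what "spatial objects within data frames" looks like, here is a minimal sketch that turns an ordinary data frame into an sf object. The data frame name (cities) and its column names are hypothetical. The coordinates are the Delhi point used in the Leaflet example further down.

```r
library(sf)

# An ordinary data frame with longitude/latitude columns
cities <- data.frame(
  name = "Delhi, India",
  lng  = 77.1025,
  lat  = 28.7041
)

# Convert to an sf object; EPSG:4326 is WGS84 longitude/latitude
cities_sf <- st_as_sf(cities, coords = c("lng", "lat"), crs = 4326)

class(cities_sf)               # "sf" "data.frame"
plot(st_geometry(cities_sf))   # quick look at the geometry only
```

The lng and lat columns are replaced by a single geometry column, while the remaining columns behave like those of any other data frame.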

SF Functions

Here are the methods available for sf objects. The list includes the package's own st_* functions as well as dplyr and tidyr verbs that work on sf objects.

methods(class = "sf")
##   [1] $<-                          [                           
##   [3] [[<-                         aggregate                   
##   [5] anti_join                    arrange                     
##   [7] as.data.frame                cbind                       
##   [9] coerce                       dbDataType                  
##  [11] dbWriteTable                 distinct                    
##  [13] dplyr_reconstruct            filter                      
##  [15] full_join                    gather                      
##  [17] group_by                     group_split                 
##  [19] identify                     initialize                  
##  [21] inner_join                   left_join                   
##  [23] merge                        mutate                      
##  [25] nest                         pivot_longer                
##  [27] pivot_wider                  plot                        
##  [29] print                        rbind                       
##  [31] rename                       right_join                  
##  [33] rowwise                      sample_frac                 
##  [35] sample_n                     select                      
##  [37] semi_join                    separate                    
##  [39] separate_rows                show                        
##  [41] slice                        slotsFromS3                 
##  [43] spread                       st_agr                      
##  [45] st_agr<-                     st_area                     
##  [47] st_as_s2                     st_as_sf                    
##  [49] st_as_sfc                    st_bbox                     
##  [51] st_boundary                  st_buffer                   
##  [53] st_cast                      st_centroid                 
##  [55] st_collection_extract        st_convex_hull              
##  [57] st_coordinates               st_crop                     
##  [59] st_crs                       st_crs<-                    
##  [61] st_difference                st_drop_geometry            
##  [63] st_filter                    st_geometry                 
##  [65] st_geometry<-                st_inscribed_circle         
##  [67] st_interpolate_aw            st_intersection             
##  [69] st_intersects                st_is                       
##  [71] st_is_valid                  st_join                     
##  [73] st_line_merge                st_m_range                  
##  [75] st_make_valid                st_minimum_rotated_rectangle
##  [77] st_nearest_points            st_node                     
##  [79] st_normalize                 st_point_on_surface         
##  [81] st_polygonize                st_precision                
##  [83] st_reverse                   st_sample                   
##  [85] st_segmentize                st_set_precision            
##  [87] st_shift_longitude           st_simplify                 
##  [89] st_snap                      st_sym_difference           
##  [91] st_transform                 st_triangulate              
##  [93] st_union                     st_voronoi                  
##  [95] st_wrap_dateline             st_write                    
##  [97] st_z_range                   st_zm                       
##  [99] summarise                    transform                   
## [101] transmute                    ungroup                     
## [103] unite                        unnest                      
## see '?methods' for accessing help and source code
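A few of these methods can be combined as follows. This is only a sketch: it uses the North Carolina example file that ships with sf, and the area_km2 column, the birth-count threshold, and the object name big are choices made for illustration.

```r
library(sf)
library(dplyr)

# Example shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# st_area() returns square metres; convert to square kilometres
nc$area_km2 <- as.numeric(st_area(nc)) / 1e6

# dplyr verbs work on sf objects and keep the geometry column attached
big <- nc %>%
  filter(BIR74 > 10000) %>%
  select(NAME, BIR74, area_km2) %>%
  arrange(desc(BIR74))

print(big)
```

Even though select() names only three columns, the geometry column is kept, which is why the result can still be plotted or mapped.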

North Carolina

Here we read in the North Carolina example shapefile that ships with sf.

nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source 
##   `C:\Users\cday\AppData\Local\Programs\R\R-4.2.1\library\sf\shape\nc.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
class(nc)
## [1] "sf"         "data.frame"
print(nc[9:15], n = 3)
## Simple feature collection with 100 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS:  NAD27
## First 3 features:
##   BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1  1091     1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
## 2   487     0      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
## 3  3188     5     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
par(mar = c(0,0,1,0))
plot(nc[1], reset = FALSE)  

par(mar = rep(0,4))
u <- st_union(nc)
plot(u)

Leaflet

leaflet() %>%
  addTiles() %>%
  addMarkers(lng=77.1025, lat=28.7041, 
             popup="Delhi, India")
leaflet(nc) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(color = "green", popup = paste0(
    "<b>NAME: </b>", nc$NAME, "<br>",
    "<b>AREA: </b>", nc$AREA, "<br>"
  ))
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD27 +no_defs).
## Need '+proj=longlat +datum=WGS84'
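The warning appears because nc is stored in NAD27, while leaflet expects WGS84 longitude/latitude. One way to avoid it, sketched here under the assumption that reprojecting is acceptable for your use case, is to transform the layer with st_transform() before mapping. The names nc_wgs84 and m are hypothetical.

```r
library(sf)
library(leaflet)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Reproject from NAD27 to WGS84 (EPSG:4326), which leaflet expects
nc_wgs84 <- st_transform(nc, 4326)

m <- leaflet(nc_wgs84) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(color = "green", popup = ~NAME)   # formula refers to the NAME column

m
```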

Shiny

“Shiny is an R package that enables building interactive web applications that can execute R code on the backend. With Shiny, you can host standalone applications on a webpage, embed interactive charts in R Markdown documents, or build dashboards. You can also extend your Shiny applications with CSS themes, HTML widgets, and JavaScript actions.”

Example: https://gpilgrim.shinyapps.io/SwimmingProject-Click/?_ga=2.128768972.1778870320.1676330526-1135307234.1655994064

Example Code: https://github.com/gpilgrim2670/SwimMap/blob/master/app.R

Chris Example: https://christopher-day.shinyapps.io/bus_speeds_viewer/
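For a sense of how little code a basic app needs, here is a minimal sketch of a Shiny app. It is not taken from the examples linked above. The input and output IDs ("bins", "hist") are made up, and it uses the built-in faithful dataset.

```r
library(shiny)

# UI: a slider input and a placeholder for the plot
ui <- fluidPage(
  titlePanel("Histogram demo"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: R code that re-runs whenever the slider changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(faithful$waiting, breaks = input$bins,
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

app <- shinyApp(ui, server)
# In an interactive session, start the app with: runApp(app)
```

The ui object describes what the page looks like, and the server function holds the R code that runs on the backend in response to user input.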

Bookdown / Quarto

Bookdown is an open-source R package that can be used to write books, documentation, reports, articles and more with R Markdown. Some of the advantages are:

  • Generate printer-ready books/ebooks
  • Uses a markup language that’s easier to learn than LaTeX
  • Many output formats (PDF, LaTeX, HTML, EPUB, Word)
  • Can use interactive graphics
  • Supports many languages besides R (such as Python)
  • Can use LaTeX equations
  • GitHub compatible
  • RStudio integration
  • One-click publishing

Example: https://bookdown.org/rdpeng/rprogdatascience/